Equilibrated adaptive learning rates for non-convex optimization

Authors

  • Yann Dauphin
  • Harm de Vries
  • Yoshua Bengio
Abstract

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we find how considering the presence of negative eigenvalues of the Hessian could help us design better suited adaptive learning rate schemes. We show that the popular Jacobi preconditioner has undesirable behavior in the presence of both positive and negative curvature, and present theoretical and empirical evidence that the so-called equilibration preconditioner is comparatively better suited to non-convex problems. We introduce a novel adaptive learning rate scheme, called ESGD, based on the equilibration preconditioner. Our experiments show that ESGD performs as well as or better than RMSProp in terms of convergence speed, always clearly improving over plain stochastic gradient descent.
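
The contrast between the two preconditioners named in the abstract can be sketched numerically. The snippet below is a minimal illustration and not the authors' implementation. It assumes the usual definitions, with the Jacobi preconditioner taken as the absolute value of the Hessian diagonal and the equilibration preconditioner taken as the Euclidean norm of each Hessian row. The toy matrix `H` and all function names are hypothetical. It also shows that the row norms can be estimated from Hessian-vector products with Gaussian probes, since the expectation of `(H v)_i ** 2` equals the squared norm of row `i` when `v` is standard normal.

```python
import numpy as np


def jacobi_preconditioner(H):
    """Absolute value of the diagonal of H (assumed Jacobi definition)."""
    return np.abs(np.diag(H))


def equilibration_preconditioner(H):
    """Euclidean norm of each row of H (assumed equilibration definition)."""
    return np.linalg.norm(H, axis=1)


def estimate_equilibration(hvp, dim, num_samples, rng):
    """Monte Carlo estimate of the row norms of H.

    Uses only Hessian-vector products: sqrt(mean((H v)^2)) with v ~ N(0, I).
    `hvp` is any callable that maps a vector v to H @ v.
    """
    acc = np.zeros(dim)
    for _ in range(num_samples):
        v = rng.standard_normal(dim)
        acc += hvp(v) ** 2
    return np.sqrt(acc / num_samples)


# Hypothetical indefinite Hessian: tiny diagonal curvature in the first two
# coordinates, but strong coupling between them through the off-diagonal.
H = np.array([
    [0.01, 2.0, 0.0],
    [2.0, -0.01, 0.5],
    [0.0, 0.5, 1.0],
])

rng = np.random.default_rng(0)
d_jacobi = jacobi_preconditioner(H)
d_equil = equilibration_preconditioner(H)
d_est = estimate_equilibration(lambda v: H @ v, H.shape[0], 20000, rng)

# Per-parameter step scaling is 1 / D: a near-zero Jacobi entry produces a
# very large step, while the row norm stays bounded away from zero here.
print("Jacobi scaling:       ", 1.0 / d_jacobi)
print("Equilibration scaling:", 1.0 / d_equil)
print("Estimated row norms:  ", d_est)
```

In this toy case the Jacobi entries for the first two coordinates are 0.01, so the implied step scaling is 100, whereas the row norms are roughly 2. Because a row norm is never smaller than the absolute diagonal entry of that row, the equilibration scaling is never larger than the Jacobi scaling.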

Similar resources

RMSProp and equilibrated adaptive learning rates for non-convex optimization

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we find how considering the presence of negative eigenvalues of the Hessian could help u...

An Intelligent Approach Based on Meta-Heuristic Algorithm for Non-Convex Economic Dispatch

One of the significant strategies of the power systems is Economic Dispatch (ED) problem, which is defined as the optimal generation of power units to produce energy at the lowest cost by fulfilling the demand within several limits. The undeniable impacts of ramp rate limits, valve loading, prohibited operating zone, spinning reserve and multi-fuel option on the economic dispatch of practical p...

Sparse Regularized Deep Neural Networks For Efficient Embedded Learning

Deep learning is becoming more widespread in its application due to its power in solving complex classification problems. However, deep learning models often require large memory and energy consumption, which may prevent them from being deployed effectively on embedded platforms, limiting their applications. This work addresses the problem by proposing methods Weight Reduction Quantisation for ...

Algorithmic Connections between Active Learning and Stochastic Convex Optimization

Interesting theoretical associations have been established by recent papers between the fields of active learning and stochastic convex optimization due to the common role of feedback in sequential querying mechanisms. In this paper, we continue this thread in two parts by exploiting these relations for the first time to yield novel algorithms in both fields, further motivating the study of the...

Unifying Stochastic Convex Optimization and Active Learning

First order stochastic convex optimization is an extremely well-studied area with a rich history of over a century of optimization research. Active learning is a relatively newer discipline that grew independently of the former, gaining popularity in the learning community over the last few decades due to its promising improvements over passive learning. Over the last year, we have uncovered co...

Publication date: 2015